Markov Decision Models with Weighted Discounted Criteria

Authors

  • Eugene A. Feinberg
  • Adam Shwartz
Abstract

We consider a discrete-time Markov Decision Process with infinite horizon. The criterion to be maximized is the sum of a number of standard discounted rewards, each with a different discount factor. Situations in which such criteria arise include modeling investments, production, modeling projects of different durations and systems with multiple criteria, and some axiomatic formulations of multi-attribute preference theory. We show that for this criterion, for some positive ε, there need not exist an ε-optimal (randomized) stationary strategy, even when the state and action sets are finite. However, ε-optimal Markov (non-randomized) strategies and optimal Markov strategies exist under weak conditions. We exhibit ε-optimal Markov strategies which are stationary from some time onward. When both state and action spaces are finite, there exists an optimal Markov strategy with this property. We provide an explicit algorithm for the computation of such strategies and give a description of the set of optimal strategies.
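As an illustration of the criterion itself (not the paper's algorithm for computing optimal strategies), the following minimal sketch evaluates the weighted discounted value of one fixed stationary strategy on a finite model. It assumes the strategy has already been folded into a transition matrix P and into one reward vector per discount factor; the function names, argument layout, and the fixed-point iteration are hypothetical choices made for this example.

```python
def discounted_value(P, r, beta, tol=1e-12, max_iter=100_000):
    """Standard discounted value of a fixed stationary strategy.

    Iterates v <- r + beta * P v until the largest update is below tol.
    P is a row-stochastic matrix (list of lists), r a reward vector,
    and 0 <= beta < 1 the discount factor (all assumed inputs).
    """
    n = len(r)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = [
            r[i] + beta * sum(P[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def weighted_discounted_value(P, rewards, betas):
    """Sum of standard discounted values, one per (reward, discount) pair."""
    total = [0.0] * len(P)
    for r, beta in zip(rewards, betas):
        v = discounted_value(P, r, beta)
        total = [t + x for t, x in zip(total, v)]
    return total


if __name__ == "__main__":
    # Hypothetical two-state chain induced by some stationary strategy.
    P = [[0.5, 0.5],
         [0.0, 1.0]]
    rewards = [[1.0, 0.0],   # reward stream discounted with betas[0]
               [0.0, 2.0]]   # reward stream discounted with betas[1]
    betas = [0.5, 0.9]
    print(weighted_discounted_value(P, rewards, betas))
```

Because each reward stream is discounted at its own rate, the relative weight of the streams changes over time, which is consistent with the abstract's point that stationary strategies need not be ε-optimal while Markov strategies that become stationary from some time onward can be.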

Similar resources

Constrained Markov Decision Models with Weighted Discounted Rewards

This paper deals with constrained optimization of Markov Decision Processes. Both objective function and constraints are sums of standard discounted rewards, but each with a different discount factor. Such models arise, e.g., in production and in applications involving multiple time scales. We prove that if a feasible policy exists, then there exists an optimal policy which is (i) stationary (non...

Accelerated decomposition techniques for large discounted Markov decision processes

Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...

Second Order Optimality in Transient and Discounted Markov Decision Chains

Abstract. The article is devoted to second order optimality in Markov decision processes. Attention is primarily focused on the reward variance for discounted models and undiscounted transient models (i.e. where the spectral radius of the transition probability matrix is less than unity). Considering the second order optimality criteria means that in the class of policies maximizing (or minimiz...

Chapter for MARKOV DECISION PROCESSES

Mixed criteria are linear combinations of standard criteria which cannot be represented as standard criteria. Linear combinations of total discounted and average rewards as well as linear combinations of total discounted rewards are examples of mixed criteria. We discuss the structure of optimal policies and algorithms for their computation for problems with and without constraints.

Weighted Discounted Stochastic Games with Perfect Information

We consider a two-person zero-sum stochastic game with an infinite time horizon. The payoff is a linear combination of expected total discounted rewards with different discount factors. For a model with a countable state space and compact action sets, we characterize the set of persistently optimal (sub-game perfect) policies. For a model with finite state and action sets and with perfect informatio...

Journal:
  • Math. Oper. Res.

Volume 19, Issue 

Pages -

Publication date: 1994